# Long video understanding
Eagle2.5 8B
Other
Eagle 2.5 is a vision-language model (VLM) designed for long-context multimodal learning; it can process video sequences of up to 512 frames as well as high-resolution images.
Text-to-Image
Transformers Other

nvidia
2,626
8
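
The Eagle 2.5 entry above gives a budget of 512 frames per video. One common way to fit a longer clip under such a budget is uniform temporal sampling, sketched below. `sample_frame_indices` is a hypothetical helper written for illustration. It is not part of Eagle 2.5 or its processor, and it does not describe how Eagle 2.5 itself selects frames.

```python
from typing import List


def sample_frame_indices(num_frames: int, max_frames: int = 512) -> List[int]:
    """Return at most `max_frames` evenly spaced frame indices (hypothetical helper).

    Assumption: uniform sampling, taking the centre frame of each of
    `max_frames` equal-width temporal segments. Not Eagle 2.5's own logic.
    """
    if num_frames <= 0:
        raise ValueError("num_frames must be positive")
    if num_frames <= max_frames:
        # Short video: keep every frame.
        return list(range(num_frames))
    # Long video: split into max_frames segments and take each segment's centre.
    step = num_frames / max_frames
    return [int(step * i + step / 2) for i in range(max_frames)]


# A 10-frame clip reduced to a 4-frame budget.
print(sample_frame_indices(10, max_frames=4))  # [1, 3, 6, 8]
```

Because `step` is greater than 1 whenever the video exceeds the budget, the returned indices are strictly increasing and always below `num_frames`.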
Llavaction 0.5B
LLaVAction is a multimodal large language model for action recognition, based on the Qwen2 language model, trained on the EPIC-KITCHENS-100-MQA dataset.
Video-to-Text
Transformers English

MLAdaptiveIntelligence
215
1
Timesformer Base Finetuned K600
TimeSformer is a video classification model based on spatio-temporal attention mechanisms, fine-tuned on the Kinetics-600 dataset.
Video Processing
Transformers

fcakyon
20
0
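
The spatio-temporal attention mentioned in the TimeSformer entry can be illustrated with divided space-time attention, the factorization proposed in the TimeSformer paper. Each patch first attends across frames at its own spatial position, then attends to the other patches in its frame. The NumPy sketch below is a simplification under stated assumptions: single head, no learned query/key/value projections, no classification token, and no layer norm or MLP. The function names are hypothetical, and this is not the Hugging Face `TimesformerModel` implementation.

```python
import numpy as np


def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    # Subtract the max for numerical stability before exponentiating.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def self_attention(x: np.ndarray) -> np.ndarray:
    """Single-head scaled dot-product attention over axis -2.

    Simplification: x is used directly as query, key and value.
    Leading axes are treated as independent batch dimensions.
    """
    d = x.shape[-1]
    scores = x @ np.swapaxes(x, -1, -2) / np.sqrt(d)  # (..., L, L)
    return softmax(scores, axis=-1) @ x                # (..., L, D)


def divided_space_time_attention(tokens: np.ndarray) -> np.ndarray:
    """tokens: shape (T, N, D) = (frames, patches per frame, channels)."""
    # Temporal step: for each patch position, attend across the T frames.
    per_position = np.swapaxes(tokens, 0, 1)                      # (N, T, D)
    x = tokens + np.swapaxes(self_attention(per_position), 0, 1)  # residual, (T, N, D)
    # Spatial step: within each frame, patches attend to one another.
    return x + self_attention(x)                                  # (T, N, D)


rng = np.random.default_rng(0)
clip = rng.standard_normal((8, 16, 32))  # 8 frames, 16 patches, 32 channels
print(divided_space_time_attention(clip).shape)  # (8, 16, 32)
```

Factorizing attention this way means each token attends to T + N tokens rather than T × N under joint space-time attention, which is what keeps the cost manageable as the number of frames grows.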